The Interplay of Syntax and Morphology in Building Parsing Models for Modern Hebrew

Author

  • Reut Tsarfaty
Abstract

As of yet, there is no statistical parser for Modern Hebrew (MH). Current practice in building parsing models is not immediately applicable to languages that exhibit strong interaction between syntax and morphology, e.g. Modern Hebrew, Arabic and other Semitic languages. We suggest that incorporating morphological and morphosyntactic information into the parsing model is essential for parsing Semitic languages. Using a morphological analyzer, a part-of-speech tagger, and a PCFG-based general purpose parser, we segment and parse unseen MH sentences using a small annotated corpus. The Parseval scores obtained are not comparable to those of, e.g., state-of-the-art models for English, due to remaining syntactic ambiguity and limited morphological treatment. We conjecture that adequate morphological and syntactic processing of MH should be done in a unified framework in which morphology and syntax can freely interact and share information in both directions.
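
The Parseval scores mentioned above are computed over labeled brackets. The following is a minimal sketch of that computation, not the evaluation code used in the paper. The function name `parseval`, the `(label, start, end)` bracket representation, and the toy English-style brackets are assumptions made for illustration. The sketch also assumes the gold and predicted trees share one segmentation, which may not hold for MH when segmentation is itself predicted.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labeled bracket precision, recall and F1.

    Each bracket is a hypothetical (label, start, end) tuple over a shared
    token segmentation; repeated brackets are matched as a multiset.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket matches at most as often as it
    # occurs in both the gold and the predicted tree.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if matched else 0.0
    return precision, recall, f1


# Toy brackets (illustrative only): one NP span differs between the trees.
gold = [("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4), ("NP", 3, 4)]
predicted = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)]
precision, recall, f1 = parseval(gold, predicted)
```

In this toy case three of the four brackets match on each side, so precision and recall coincide. A segmentation error would shift span boundaries and typically lower both.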

Related resources

Integrated Morphological and Syntactic Disambiguation for Modern Hebrew

Current parsing models are not immediately applicable for languages that exhibit strong interaction between morphology and syntax, e.g., Modern Hebrew (MH), Arabic and other Semitic languages. This work represents a first attempt at modeling morphological-syntactic interaction in a generative probabilistic framework to allow for MH parsing. We show that morphological information selected in tan...

Building a Tree-Bank of Modern Hebrew Text

This paper describes the process of building the first tree-bank for Modern Hebrew texts. A major concern in this process is the need for reducing the cost of manual annotation by the use of automatic means. To this end, the joint utility of an automatic morphological analyzer, a probabilistic parser and a small manually annotated tree-bank was explored. An initial tree-bank that consists of 50...

Modeling Morphosyntactic Agreement in Constituency-Based Parsing of Modern Hebrew

We show that naïve modeling of morphosyntactic agreement in a Constituency-Based (CB) statistical parsing model is worse than none, whereas a linguistically adequate way of modeling inflectional morphology in CB parsing leads to improved performance. In particular, we show that an extension of the Relational-Realizational (RR) model that incorporates agreement features is superior to CB models ...

An improved joint model: POS tagging and dependency parsing

Dependency parsing is a form of syntactic parsing that automatically analyzes the dependency structure of sentences, producing a dependency graph for each input sentence. Part-of-speech (POS) tagging is a prerequisite for dependency parsing. Generally, dependency parsers perform POS tagging and dependency parsing in a pipeline. Unfortunately, in pipel...

The Effect of Morphology on Dependency Parsing of Persian

Data-driven systems can be adapted easily to different languages and domains. Applying this trend to dependency parsing led to the introduction of data-driven approaches. The only prerequisite for data-driven approaches is the existence of appropriate corpora containing sentences and their associated dependency trees. Despite obtaining highly accurate results for the dependency parsing task in English langu...

Publication date: 2006